Research Question
To what extent do funding disparities in traditional U.S. public schools help explain the underrepresentation of Black and Hispanic youth in high-cost sports?
Problem Statement
Sports specialization involves year-round training and competition and requires costly investments in participation, travel, and equipment fees. These costs create significant financial barriers for youth from lower socioeconomic status (SES) backgrounds. In addition, public school funding disparities can limit access to adequate facilities, personnel, and physical education, which further restricts sports participation opportunities for youth in lower-SES communities. These disparities may contribute to the underrepresentation of Black and Hispanic youth in sports with high financial barriers, such as hockey, gymnastics, and tennis. Less expensive sports such as track and field are more accessible by comparison.
Potential Subtopics
- Correlation between public school funding and facility quality
- Connection between SES and physical activity/education
Data Definition
Public School Characteristics 2022-23
Last Updated: October 21, 2024
https://catalog.data.gov/dataset/public-school-characteristics-2022-23-451db
The National Center for Education Statistics (NCES) collects demographic and geographic data on U.S. public schools, along with characteristics such as enrollment and Title I status. The file also reports the number of students eligible for free or reduced-price lunch. Researchers could combine this dataset with the YRBSS (described below) to examine how physical activity rates differ between students or schools with lower and higher SES.
Additional Datasets of Interest
Nutrition, Physical Activity, and Obesity - Youth Risk Behavior Surveillance System
Last Updated: February 4, 2025
The Youth Risk Behavior Surveillance System (YRBSS) is conducted by the Centers for Disease Control and Prevention (CDC). It monitors health behaviors among middle and high school students nationwide. It collects data on physical activity and nutrition, along with geographic and socioeconomic factors. These data could support further research on how socioeconomic factors affect health behaviors.
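The sketch below shows one way the two sources might be combined. It assumes the YRBSS data are available as a state-level table keyed by a two-letter state abbreviation. The `yrbss` frame, its `pct_physically_active` column, and its values are hypothetical placeholders, not real YRBSS estimates. The school rows reuse figures from three NCES rows shown later and use the renamed column names adopted later in this notebook. The free or reduced-price lunch share serves only as a rough SES proxy.

```python
import pandas as pd

# School-level rows shaped like the NCES file (renamed columns used later in this
# notebook); figures come from rows shown in the head/tail output.
schools = pd.DataFrame({
    "StateABR": ["AL", "AL", "VI"],
    "TotalFreeLunch": [697, 1254, 228],
    "TotalEnrollment": [890, 1712, 231],
})

# Hypothetical state-level YRBSS extract: column name and values are placeholders.
yrbss = pd.DataFrame({
    "StateABR": ["AL", "VI"],
    "pct_physically_active": [50.0, 40.0],
})

# Aggregate schools to the state level so both tables share one unit of analysis.
state = schools.groupby("StateABR", as_index=False)[["TotalFreeLunch", "TotalEnrollment"]].sum()
state["FRL_pct"] = 100 * state["TotalFreeLunch"] / state["TotalEnrollment"]

# A left join keeps every state in the NCES summary even without a YRBSS match.
merged = state.merge(yrbss, on="StateABR", how="left")
print(merged.round(2))
```

Aggregating before joining matters because YRBSS estimates are not reported per school. Joining school rows directly would repeat each state value many times.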
Data Collection
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Display every row and column of a DataFrame
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Hide library warnings in the notebook output
warnings.filterwarnings('ignore')
Read the Data
# Load the NCES Public School Characteristics 2022-23 file (read_csv already returns a DataFrame)
psChar_23 = pd.read_csv('Public_School_Characteristics_2022-23.csv')
psChar_23.head(7)
| X | Y | OBJECTID | NCESSCH | SURVYEAR | STABR | LEAID | ST_LEAID | LEA_NAME | SCH_NAME | LSTREET1 | LSTREET2 | LCITY | LSTATE | LZIP | LZIP4 | PHONE | CHARTER_TEXT | VIRTUAL | GSLO | GSHI | SCHOOL_LEVEL | STATUS | SCHOOL_TYPE_TEXT | SY_STATUS_TEXT | ULOCALE | NMCNTY | TOTFRL | FRELCH | REDLCH | DIRECTCERT | PK | KG | G01 | G02 | G03 | G04 | G05 | G06 | G07 | G08 | G09 | G10 | G11 | G12 | G13 | UG | AE | TOTMENROL | TOTFENROL | TOTAL | MEMBER | FTE | STUTERATIO | AMALM | AMALF | AM | ASALM | ASALF | AS | BLALM | BLALF | BL | HPALM | HPALF | HP | HIALM | HIALF | HI | TRALM | TRALF | TR | WHALM | WHALF | WH | LATCOD | LONCOD | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -86.206200 | 34.26020 | 1 | 10000500870 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Middle School | 600 E Alabama Ave | NaN | Albertville | AL | 35950 | (256)878-2341 | No | Not Virtual | 07 | 08 | Middle | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 697 | 654 | 43 | 587 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 440.0 | 450.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 459.0 | 431.0 | 890.0 | 890.0 | 45.000000 | 19.78 | 4.0 | 1.0 | 5.0 | 4.0 | 2.0 | 6.0 | 15.0 | 14.0 | 29.0 | 0.0 | 1.0 | 1.0 | 251.0 | 251.0 | 502.0 | 17.0 | 15.0 | 32.0 | 168.0 | 147.0 | 315.0 | 34.26020 | -86.206200 | |
| 1 | -86.204900 | 34.26220 | 2 | 10000500871 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville High School | 402 E McCord Ave | NaN | Albertville | AL | 35950 | 2322 | (256)894-5000 | No | Not Virtual | 09 | 12 | High | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 1254 | 1178 | 76 | 1059 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 493.0 | 442.0 | 390.0 | 387.0 | NaN | NaN | NaN | 868.0 | 844.0 | 1712.0 | 1712.0 | 85.199997 | 20.09 | 0.0 | 2.0 | 2.0 | 4.0 | 5.0 | 9.0 | 23.0 | 34.0 | 57.0 | 0.0 | 0.0 | 0.0 | 490.0 | 468.0 | 958.0 | 26.0 | 19.0 | 45.0 | 325.0 | 316.0 | 641.0 | 34.26220 | -86.204900 |
| 2 | -86.220100 | 34.27330 | 3 | 10000500879 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Intermediate School | 901 W McKinney Ave | NaN | Albertville | AL | 35950 | 1300 | (256)878-7698 | No | Not Virtual | 05 | 06 | Middle | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 718 | 665 | 53 | 570 | NaN | NaN | NaN | NaN | NaN | NaN | 412.0 | 462.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 451.0 | 423.0 | 874.0 | 874.0 | 43.000000 | 20.33 | 1.0 | 4.0 | 5.0 | 4.0 | 0.0 | 4.0 | 22.0 | 28.0 | 50.0 | 0.0 | 0.0 | 0.0 | 263.0 | 241.0 | 504.0 | 7.0 | 6.0 | 13.0 | 154.0 | 144.0 | 298.0 | 34.27330 | -86.220100 |
| 3 | -86.221806 | 34.25270 | 4 | 10000500889 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Elementary School | 145 West End Drive | NaN | Albertville | AL | 35950 | (256)894-4822 | No | Not Virtual | 03 | 04 | Elementary | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 723 | 680 | 43 | 583 | NaN | NaN | NaN | NaN | 430.0 | 444.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 463.0 | 411.0 | 874.0 | 874.0 | 43.000000 | 20.33 | 0.0 | 4.0 | 4.0 | 1.0 | 3.0 | 4.0 | 22.0 | 16.0 | 38.0 | 0.0 | 0.0 | 0.0 | 261.0 | 236.0 | 497.0 | 11.0 | 16.0 | 27.0 | 168.0 | 136.0 | 304.0 | 34.25270 | -86.221806 | |
| 4 | -86.193300 | 34.28980 | 5 | 10000501616 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Kindergarten and PreK | 257 Country Club Rd | NaN | Albertville | AL | 35951 | 3927 | (256)878-7922 | No | Not Virtual | PK | KG | Elementary | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 392 | 367 | 25 | 240 | 133.0 | 473.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 304.0 | 302.0 | 606.0 | 606.0 | 26.000000 | 23.31 | 1.0 | 3.0 | 4.0 | 2.0 | 0.0 | 2.0 | 26.0 | 23.0 | 49.0 | 0.0 | 0.0 | 0.0 | 167.0 | 152.0 | 319.0 | 4.0 | 4.0 | 8.0 | 104.0 | 120.0 | 224.0 | 34.28980 | -86.193300 |
| 5 | -86.221800 | 34.25330 | 6 | 10000502150 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Primary School | 1100 Horton Rd | NaN | Albertville | AL | 35950 | 2532 | (256)878-6611 | No | Not Virtual | 01 | 02 | Elementary | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 779 | 726 | 53 | 617 | 0.0 | NaN | 427.0 | 517.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 498.0 | 446.0 | 944.0 | 944.0 | 61.000000 | 15.48 | 9.0 | 1.0 | 10.0 | 3.0 | 0.0 | 3.0 | 24.0 | 21.0 | 45.0 | 0.0 | 1.0 | 1.0 | 290.0 | 256.0 | 546.0 | 9.0 | 10.0 | 19.0 | 163.0 | 157.0 | 320.0 | 34.25330 | -86.221800 |
| 6 | -86.254153 | 34.53375 | 7 | 10000600193 | 2022-2023 | AL | 100006 | AL-048 | Marshall County | Kate Duncan Smith DAR Middle | 6077 Main St | NaN | Grant | AL | 35747 | (256)728-5950 | No | Not Virtual | 05 | 08 | Middle | 1 | Regular School | Currently operational | 42-Rural: Distant | Marshall County | 151 | 123 | 28 | 194 | NaN | NaN | NaN | NaN | NaN | NaN | 95.0 | 97.0 | 86.0 | 86.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 192.0 | 172.0 | 364.0 | 364.0 | 22.030001 | 16.52 | 1.0 | 3.0 | 4.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 6.0 | 8.0 | 14.0 | 5.0 | 9.0 | 14.0 | 178.0 | 152.0 | 330.0 | 34.53375 | -86.254153 |
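The output suggests that some identifier columns lost their leading zeros when pandas parsed them as integers. The Alabama `NCESSCH` values have 11 digits, while the Virgin Islands values in the tail output have 12. The Virgin Islands ZIP codes appear as three-digit numbers such as 802. The sketch below uses a toy CSV and assumes the raw file stores these values with their leading zeros. It shows two ways to keep them: declaring the columns as strings when reading, or zero-padding after the fact.

```python
import io

import pandas as pd

# Toy CSV excerpt; it assumes the raw file keeps leading zeros in IDs and ZIPs.
raw = (
    "NCESSCH,LEAID,LZIP\n"
    "010000500870,0100005,35950\n"
    "780003000020,7800030,00802\n"
)

# Default parsing infers int64, which silently drops leading zeros.
default = pd.read_csv(io.StringIO(raw))

# Declaring the identifier columns as strings keeps them exactly as written.
as_text = pd.read_csv(io.StringIO(raw), dtype={"NCESSCH": str, "LEAID": str, "LZIP": str})

# If a column was already parsed as a number, zero-padding restores a fixed width.
repaired_zip = default["LZIP"].astype(str).str.zfill(5)

print(default["LZIP"].tolist())  # [35950, 802]
print(as_text["LZIP"].tolist())  # ['35950', '00802']
print(repaired_zip.tolist())     # ['35950', '00802']
```

In this notebook, the equivalent change would be passing a `dtype` mapping like the one above to the existing `read_csv` call. Identifiers and ZIP codes are labels rather than quantities, so storing them as strings also keeps them out of `describe()`.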
psChar_23.tail(7)
| X | Y | OBJECTID | NCESSCH | SURVYEAR | STABR | LEAID | ST_LEAID | LEA_NAME | SCH_NAME | LSTREET1 | LSTREET2 | LCITY | LSTATE | LZIP | LZIP4 | PHONE | CHARTER_TEXT | VIRTUAL | GSLO | GSHI | SCHOOL_LEVEL | STATUS | SCHOOL_TYPE_TEXT | SY_STATUS_TEXT | ULOCALE | NMCNTY | TOTFRL | FRELCH | REDLCH | DIRECTCERT | PK | KG | G01 | G02 | G03 | G04 | G05 | G06 | G07 | G08 | G09 | G10 | G11 | G12 | G13 | UG | AE | TOTMENROL | TOTFENROL | TOTAL | MEMBER | FTE | STUTERATIO | AMALM | AMALF | AM | ASALM | ASALF | AS | BLALM | BLALF | BL | HPALM | HPALF | HP | HIALM | HIALF | HI | TRALM | TRALF | TR | WHALM | WHALF | WH | LATCOD | LONCOD | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101383 | -64.932456 | 18.352146 | 101384 | 780003000020 | 2022-2023 | VI | 7800030 | VI-001 | Saint Thomas - Saint John School District | JOSEPH SIBILLY ELEMENTARY SCHOOL | 14 15 16 ESTATE ELIZABETH | NaN | Saint Thomas | VI | 802 | (340)774-7001 | N | Not Virtual | PK | 06 | Elementary | 1 | Regular School | Currently operational | 33-Town: Remote | St. Thomas Island | 228 | 228 | 0 | -1 | 19.0 | 25.0 | 25.0 | 25.0 | 31.0 | 34.0 | 34.0 | 38.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 121.0 | 110.0 | 231.0 | 231.0 | 16.0 | 14.44 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 4.0 | 99.0 | 93.0 | 192.0 | 0.0 | 0.0 | 0.0 | 8.0 | 5.0 | 13.0 | 2.0 | 1.0 | 3.0 | 10.0 | 9.0 | 19.0 | 18.352146 | -64.932456 | |
| 101384 | -64.793916 | 18.330464 | 101385 | 780003000022 | 2022-2023 | VI | 7800030 | VI-001 | Saint Thomas - Saint John School District | JULIUS E SPRAUVE | 14 18 ESTATE ENIGHED | NaN | Saint John | VI | 831 | (340)776-6336 | N | Not Virtual | PK | 08 | Elementary | 1 | Regular School | Currently operational | 33-Town: Remote | St. John Island | 199 | 199 | 0 | -1 | 8.0 | 21.0 | 16.0 | 21.0 | 14.0 | 24.0 | 20.0 | 26.0 | 27.0 | 25.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 103.0 | 99.0 | 202.0 | 202.0 | 20.0 | 10.10 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 79.0 | 68.0 | 147.0 | 0.0 | 0.0 | 0.0 | 22.0 | 29.0 | 51.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 | 18.330464 | -64.793916 | |
| 101385 | -64.917602 | 18.341950 | 101386 | 780003000024 | 2022-2023 | VI | 7800030 | VI-001 | Saint Thomas - Saint John School District | LOCKHART ELEMENTARY SCHOOL | 41 ESTATE THOMAS | NaN | Saint Thomas | VI | 802 | (340)775-0820 | N | Not Virtual | KG | 03 | Elementary | 1 | Regular School | Currently operational | 33-Town: Remote | St. Thomas Island | 295 | 295 | 0 | -1 | NaN | 77.0 | 75.0 | 69.0 | 77.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 171.0 | 127.0 | 298.0 | 298.0 | 18.0 | 16.56 | 0.0 | 0.0 | 0.0 | 4.0 | 3.0 | 7.0 | 132.0 | 92.0 | 224.0 | 0.0 | 0.0 | 0.0 | 33.0 | 30.0 | 63.0 | 1.0 | 2.0 | 3.0 | 1.0 | 0.0 | 1.0 | 18.341950 | -64.917602 | |
| 101386 | -64.952483 | 18.338742 | 101387 | 780003000026 | 2022-2023 | VI | 7800030 | VI-001 | Saint Thomas - Saint John School District | ULLA F MULLER ELEMENTARY SCHOOL | 7B ESTATE CONTANT | NaN | Saint Thomas | VI | 802 | (340)774-0059 | N | Not Virtual | KG | 06 | Elementary | 1 | Regular School | Currently operational | 33-Town: Remote | St. Thomas Island | 417 | 417 | 0 | -1 | NaN | 52.0 | 53.0 | 51.0 | 47.0 | 70.0 | 79.0 | 68.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 200.0 | 220.0 | 420.0 | 420.0 | 28.0 | 15.00 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 | 6.0 | 167.0 | 182.0 | 349.0 | 0.0 | 0.0 | 0.0 | 27.0 | 27.0 | 54.0 | 2.0 | 0.0 | 2.0 | 2.0 | 5.0 | 7.0 | 18.338742 | -64.952483 | |
| 101387 | -64.899024 | 18.354782 | 101388 | 780003000027 | 2022-2023 | VI | 7800030 | VI-001 | Saint Thomas - Saint John School District | YVONNE BOWSKY ELEMENTARY SCHOOL | 15B and 16 ESTATE MANDAHL | NaN | Saint Thomas | VI | 802 | (340)775-3220 | N | Not Virtual | PK | 05 | Elementary | 1 | Regular School | Currently operational | 33-Town: Remote | St. Thomas Island | 425 | 425 | 0 | -1 | 22.0 | 62.0 | 67.0 | 66.0 | 75.0 | 68.0 | 68.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 252.0 | 176.0 | 428.0 | 428.0 | 34.0 | 12.59 | 1.0 | 1.0 | 2.0 | 5.0 | 4.0 | 9.0 | 201.0 | 144.0 | 345.0 | 0.0 | 0.0 | 0.0 | 37.0 | 22.0 | 59.0 | 0.0 | 1.0 | 1.0 | 8.0 | 4.0 | 12.0 | 18.354782 | -64.899024 | |
| 101388 | -64.945940 | 18.336658 | 101389 | 780003000033 | 2022-2023 | VI | 7800030 | VI-001 | Saint Thomas - Saint John School District | CANCRYN JUNIOR HIGH SCHOOL | 1 CROWN BAY | NaN | Saint Thomas | VI | 804 | (340)774-4540 | N | Not Virtual | 04 | 08 | Middle | 1 | Regular School | Currently operational | 33-Town: Remote | St. Thomas Island | 683 | 683 | 0 | -1 | NaN | NaN | NaN | NaN | NaN | 77.0 | 119.0 | 96.0 | 189.0 | 205.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 361.0 | 325.0 | 686.0 | 686.0 | 62.0 | 11.06 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 4.0 | 279.0 | 250.0 | 529.0 | 0.0 | 0.0 | 0.0 | 74.0 | 62.0 | 136.0 | 0.0 | 1.0 | 1.0 | 6.0 | 10.0 | 16.0 | 18.336658 | -64.945940 | |
| 101389 | -64.890311 | 18.318230 | 101390 | 780003000034 | 2022-2023 | VI | 7800030 | VI-001 | Saint Thomas - Saint John School District | BERTHA BOSCHULTE JUNIOR HIGH | 9 1 and 12A BOVONI | NaN | Saint Thomas | VI | 802 | (340)775-4222 | N | Not Virtual | 06 | 08 | Middle | 1 | Regular School | Currently operational | 33-Town: Remote | St. Thomas Island | 504 | 504 | 0 | -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 145.0 | 169.0 | 193.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 279.0 | 228.0 | 507.0 | 507.0 | 49.0 | 10.35 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 3.0 | 250.0 | 204.0 | 454.0 | 0.0 | 0.0 | 0.0 | 27.0 | 21.0 | 48.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 18.318230 | -64.890311 |
psChar_23.shape
(101390, 77)
- The DataFrame has 101,390 rows.
- The DataFrame has 77 columns (features).
- The dataset therefore contains 101,390 × 77 = 7,807,030 data points (cells).
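As a quick check of the arithmetic above, the number of cells is the product of the two dimensions reported by `psChar_23.shape`. `DataFrame.size` returns the same product directly. The sketch uses those two numbers plus a small toy frame.

```python
import pandas as pd

# Dimensions reported by psChar_23.shape for the raw file.
rows, cols = 101_390, 77
total_cells = rows * cols
print(f"{total_cells:,}")  # 7,807,030

# DataFrame.size is rows x columns, so psChar_23.size would report the same total.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(toy.size)  # 6
```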
psChar_23.info(show_counts=True, verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101390 entries, 0 to 101389
Data columns (total 77 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   X                 101390 non-null  float64
 1   Y                 101390 non-null  float64
 2   OBJECTID          101390 non-null  int64
 3   NCESSCH           101390 non-null  int64
 4   SURVYEAR          101390 non-null  object
 5   STABR             101390 non-null  object
 6   LEAID             101390 non-null  int64
 7   ST_LEAID          101390 non-null  object
 8   LEA_NAME          101390 non-null  object
 9   SCH_NAME          101390 non-null  object
 10  LSTREET1          101389 non-null  object
 11  LSTREET2          572 non-null     object
 12  LCITY             101390 non-null  object
 13  LSTATE            101390 non-null  object
 14  LZIP              101390 non-null  int64
 15  LZIP4             101390 non-null  object
 16  PHONE             101390 non-null  object
 17  CHARTER_TEXT      101390 non-null  object
 18  VIRTUAL           101390 non-null  object
 19  GSLO              101390 non-null  object
 20  GSHI              101390 non-null  object
 21  SCHOOL_LEVEL      101390 non-null  object
 22  STATUS            101390 non-null  int64
 23  SCHOOL_TYPE_TEXT  101390 non-null  object
 24  SY_STATUS_TEXT    101390 non-null  object
 25  ULOCALE           101390 non-null  object
 26  NMCNTY            101390 non-null  object
 27  TOTFRL            101390 non-null  int64
 28  FRELCH            101390 non-null  int64
 29  REDLCH            101390 non-null  int64
 30  DIRECTCERT        101390 non-null  int64
 31  PK                32392 non-null   float64
 32  KG                54061 non-null   float64
 33  G01               54412 non-null   float64
 34  G02               54469 non-null   float64
 35  G03               54459 non-null   float64
 36  G04               54258 non-null   float64
 37  G05               53014 non-null   float64
 38  G06               38023 non-null   float64
 39  G07               33224 non-null   float64
 40  G08               33492 non-null   float64
 41  G09               28101 non-null   float64
 42  G10               27889 non-null   float64
 43  G11               27888 non-null   float64
 44  G12               27816 non-null   float64
 45  G13               133 non-null     float64
 46  UG                7889 non-null    float64
 47  AE                183 non-null     float64
 48  TOTMENROL         98910 non-null   float64
 49  TOTFENROL         98910 non-null   float64
 50  TOTAL             99719 non-null   float64
 51  MEMBER            99719 non-null   float64
 52  FTE               97537 non-null   float64
 53  STUTERATIO        99576 non-null   float64
 54  AMALM             98809 non-null   float64
 55  AMALF             98811 non-null   float64
 56  AM                98857 non-null   float64
 57  ASALM             98898 non-null   float64
 58  ASALF             98900 non-null   float64
 59  AS                98906 non-null   float64
 60  BLALM             98896 non-null   float64
 61  BLALF             98893 non-null   float64
 62  BL                98903 non-null   float64
 63  HPALM             98782 non-null   float64
 64  HPALF             98783 non-null   float64
 65  HP                98829 non-null   float64
 66  HIALM             98909 non-null   float64
 67  HIALF             98910 non-null   float64
 68  HI                98910 non-null   float64
 69  TRALM             98903 non-null   float64
 70  TRALF             98905 non-null   float64
 71  TR                98906 non-null   float64
 72  WHALM             98909 non-null   float64
 73  WHALF             98909 non-null   float64
 74  WH                98910 non-null   float64
 75  LATCOD            101390 non-null  float64
 76  LONCOD            101390 non-null  float64
dtypes: float64(48), int64(9), object(20)
memory usage: 59.6+ MB
ps23Cols = psChar_23.columns
ps23Cols
Index(['X', 'Y', 'OBJECTID', 'NCESSCH', 'SURVYEAR', 'STABR', 'LEAID',
'ST_LEAID', 'LEA_NAME', 'SCH_NAME', 'LSTREET1', 'LSTREET2', 'LCITY',
'LSTATE', 'LZIP', 'LZIP4', 'PHONE', 'CHARTER_TEXT', 'VIRTUAL', 'GSLO',
'GSHI', 'SCHOOL_LEVEL', 'STATUS', 'SCHOOL_TYPE_TEXT', 'SY_STATUS_TEXT',
'ULOCALE', 'NMCNTY', 'TOTFRL', 'FRELCH', 'REDLCH', 'DIRECTCERT', 'PK',
'KG', 'G01', 'G02', 'G03', 'G04', 'G05', 'G06', 'G07', 'G08', 'G09',
'G10', 'G11', 'G12', 'G13', 'UG', 'AE', 'TOTMENROL', 'TOTFENROL',
'TOTAL', 'MEMBER', 'FTE', 'STUTERATIO', 'AMALM', 'AMALF', 'AM', 'ASALM',
'ASALF', 'AS', 'BLALM', 'BLALF', 'BL', 'HPALM', 'HPALF', 'HP', 'HIALM',
'HIALF', 'HI', 'TRALM', 'TRALF', 'TR', 'WHALM', 'WHALF', 'WH', 'LATCOD',
'LONCOD'],
dtype='object')
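# Replace the NCES column codes with readable names.
# TOTFRL is the free plus reduced-price lunch total (FRELCH + REDLCH); TR means two or more races.
psChar_23 = psChar_23.rename(columns = {'OBJECTID':'ObjectID','NCESSCH':'NCESID','SURVYEAR':'SurveyYear',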
'STABR':'StateABR','LEA_NAME':'LEAname','SCH_NAME':'SchoolName',
'LSTREET1':'Street1','LSTREET2':'Street2','LCITY':'City',
'LSTATE':'State','LZIP':'Zip','LZIP4':'Zip4',
'PHONE':'Phone', 'CHARTER_TEXT':'Charter', 'VIRTUAL':'Virtual',
'GSLO':'LowestGrade','GSHI':'HighestGrade',
'SCHOOL_LEVEL':'SchoolLevel',
'STATUS':'Status', 'SCHOOL_TYPE_TEXT':'SchoolType',
'SY_STATUS_TEXT':'Status_Text',
'ULOCALE':'Locale', 'NMCNTY':'County',
'TOTFRL':'TotalFreeLunch',
'FRELCH':'FreeLunch', 'REDLCH':'ReducedLunch',
'DIRECTCERT':'MealProgramCertified', 'PK':'PreK',
'KG':'Kindergarten', 'G01':'Grade1', 'G02':'Grade2',
'G03':'Grade3', 'G04':'Grade4', 'G05':'Grade5',
'G06':'Grade6', 'G07':'Grade7', 'G08':'Grade8',
'G09':'Grade9','G10':'Grade10', 'G11':'Grade11',
'G12':'Grade12','G13':'Grade13', 'UG':'Ungraded',
'AE':'AdultEd', 'TOTMENROL':'TotMaleEnrollment',
'TOTFENROL':'TotFemaleEnrollment','TOTAL':'TotalEnrollment',
'MEMBER':'Member', 'FTE':'StaffFTE', 'STUTERATIO':'StudentTeacherRatio',
'AMALM':'AIANMale','AMALF':'AIANFem', 'AM':'AIANTotal',
'ASALM':'AsianMale', 'ASALF':'AsianFemale', 'AS':'AsianTotal',
'BLALM':'BlackMale','BLALF':'BlackFemale', 'BL':'BlackTotal',
'HPALM':'HPIMale', 'HPALF':'HPIFemale', 'HP':'HPITotal',
'HIALM':'HispanicMale','HIALF':'HispanicFemale', 'HI':'HispanicTotal',
'TRALM':'TRMale', 'TRALF':'TRFemale', 'TR':'TRTotal',
'WHALM':'WhiteMale','WHALF':'WhiteFemale', 'WH':'WhiteTotal',
'LATCOD':'Latitude','LONCOD':'Longitude'})
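By default, `DataFrame.rename` skips mapping keys that are not present in the columns, because its `errors` parameter defaults to `'ignore'`. A misspelled NCES code in a long mapping like the one above would therefore leave that column unrenamed without any warning. The small sketch below uses a deliberately misspelled key, `SCHNAME`, which is a hypothetical typo. It shows how `errors='raise'` turns the mistake into a `KeyError`.

```python
import pandas as pd

toy = pd.DataFrame(columns=["OBJECTID", "SCH_NAME", "LEAID"])

# A mapping with a deliberately misspelled key ("SCHNAME" instead of "SCH_NAME").
mapping = {"OBJECTID": "ObjectID", "SCHNAME": "SchoolName"}

# Default behaviour (errors="ignore"): the bad key is skipped silently.
quiet = toy.rename(columns=mapping)
print(list(quiet.columns))  # ['ObjectID', 'SCH_NAME', 'LEAID']

# errors="raise" reports mapping keys that match no existing column.
try:
    toy.rename(columns=mapping, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```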
ps23Cols = psChar_23.columns
psChar_23.head()
| X | Y | ObjectID | NCESID | SurveyYear | StateABR | LEAID | ST_LEAID | LEAname | SchoolName | Street1 | Street2 | City | State | Zip | Zip4 | Phone | Charter | Virtual | LowestGrade | HighestGrade | SchoolLevel | Status | SchoolType | Status_Text | Locale | County | TotalFreeLunch | FreeLunch | ReducedLunch | MealProgramCertified | PreK | Kindergarten | Grade1 | Grade2 | Grade3 | Grade4 | Grade5 | Grade6 | Grade7 | Grade8 | Grade9 | Grade10 | Grade11 | Grade12 | Grade13 | Ungraded | AdultEd | TotMaleEnrollment | TotFemaleEnrollment | TotalEnrollment | Member | StaffFTE | StudentTeacherRatio | AIANMale | AIANFem | AIANTotal | AsianMale | AsianFemale | AsianTotal | BlackMale | BlackFemale | BlackTotal | HPIMale | HPIFemale | HPITotal | HispanicMale | HispanicFemale | HispanicTotal | TRMale | TRFemale | TRTotal | WhiteMale | WhiteFemale | WhiteTotal | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -86.206200 | 34.2602 | 1 | 10000500870 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Middle School | 600 E Alabama Ave | NaN | Albertville | AL | 35950 | (256)878-2341 | No | Not Virtual | 07 | 08 | Middle | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 697 | 654 | 43 | 587 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 440.0 | 450.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 459.0 | 431.0 | 890.0 | 890.0 | 45.000000 | 19.78 | 4.0 | 1.0 | 5.0 | 4.0 | 2.0 | 6.0 | 15.0 | 14.0 | 29.0 | 0.0 | 1.0 | 1.0 | 251.0 | 251.0 | 502.0 | 17.0 | 15.0 | 32.0 | 168.0 | 147.0 | 315.0 | 34.2602 | -86.206200 | |
| 1 | -86.204900 | 34.2622 | 2 | 10000500871 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville High School | 402 E McCord Ave | NaN | Albertville | AL | 35950 | 2322 | (256)894-5000 | No | Not Virtual | 09 | 12 | High | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 1254 | 1178 | 76 | 1059 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 493.0 | 442.0 | 390.0 | 387.0 | NaN | NaN | NaN | 868.0 | 844.0 | 1712.0 | 1712.0 | 85.199997 | 20.09 | 0.0 | 2.0 | 2.0 | 4.0 | 5.0 | 9.0 | 23.0 | 34.0 | 57.0 | 0.0 | 0.0 | 0.0 | 490.0 | 468.0 | 958.0 | 26.0 | 19.0 | 45.0 | 325.0 | 316.0 | 641.0 | 34.2622 | -86.204900 |
| 2 | -86.220100 | 34.2733 | 3 | 10000500879 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Intermediate School | 901 W McKinney Ave | NaN | Albertville | AL | 35950 | 1300 | (256)878-7698 | No | Not Virtual | 05 | 06 | Middle | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 718 | 665 | 53 | 570 | NaN | NaN | NaN | NaN | NaN | NaN | 412.0 | 462.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 451.0 | 423.0 | 874.0 | 874.0 | 43.000000 | 20.33 | 1.0 | 4.0 | 5.0 | 4.0 | 0.0 | 4.0 | 22.0 | 28.0 | 50.0 | 0.0 | 0.0 | 0.0 | 263.0 | 241.0 | 504.0 | 7.0 | 6.0 | 13.0 | 154.0 | 144.0 | 298.0 | 34.2733 | -86.220100 |
| 3 | -86.221806 | 34.2527 | 4 | 10000500889 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Elementary School | 145 West End Drive | NaN | Albertville | AL | 35950 | (256)894-4822 | No | Not Virtual | 03 | 04 | Elementary | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 723 | 680 | 43 | 583 | NaN | NaN | NaN | NaN | 430.0 | 444.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 463.0 | 411.0 | 874.0 | 874.0 | 43.000000 | 20.33 | 0.0 | 4.0 | 4.0 | 1.0 | 3.0 | 4.0 | 22.0 | 16.0 | 38.0 | 0.0 | 0.0 | 0.0 | 261.0 | 236.0 | 497.0 | 11.0 | 16.0 | 27.0 | 168.0 | 136.0 | 304.0 | 34.2527 | -86.221806 | |
| 4 | -86.193300 | 34.2898 | 5 | 10000501616 | 2022-2023 | AL | 100005 | AL-101 | Albertville City | Albertville Kindergarten and PreK | 257 Country Club Rd | NaN | Albertville | AL | 35951 | 3927 | (256)878-7922 | No | Not Virtual | PK | KG | Elementary | 1 | Regular School | Currently operational | 32-Town: Distant | Marshall County | 392 | 367 | 25 | 240 | 133.0 | 473.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 304.0 | 302.0 | 606.0 | 606.0 | 26.000000 | 23.31 | 1.0 | 3.0 | 4.0 | 2.0 | 0.0 | 2.0 | 26.0 | 23.0 | 49.0 | 0.0 | 0.0 | 0.0 | 167.0 | 152.0 | 319.0 | 4.0 | 4.0 | 8.0 | 104.0 | 120.0 | 224.0 | 34.2898 | -86.193300 |
psChar_23.isnull().sum()
X                          0
Y                          0
ObjectID                   0
NCESID                     0
SurveyYear                 0
StateABR                   0
LEAID                      0
ST_LEAID                   0
LEAname                    0
SchoolName                 0
Street1                    1
Street2               100818
City                       0
State                      0
Zip                        0
Zip4                       0
Phone                      0
Charter                    0
Virtual                    0
LowestGrade                0
HighestGrade               0
SchoolLevel                0
Status                     0
SchoolType                 0
Status_Text                0
Locale                     0
County                     0
TotalFreeLunch             0
FreeLunch                  0
ReducedLunch               0
MealProgramCertified       0
PreK                   68998
Kindergarten           47329
Grade1                 46978
Grade2                 46921
Grade3                 46931
Grade4                 47132
Grade5                 48376
Grade6                 63367
Grade7                 68166
Grade8                 67898
Grade9                 73289
Grade10                73501
Grade11                73502
Grade12                73574
Grade13               101257
Ungraded               93501
AdultEd               101207
TotMaleEnrollment       2480
TotFemaleEnrollment     2480
TotalEnrollment         1671
Member                  1671
StaffFTE                3853
StudentTeacherRatio     1814
AIANMale                2581
AIANFem                 2579
AIANTotal               2533
AsianMale               2492
AsianFemale             2490
AsianTotal              2484
BlackMale               2494
BlackFemale             2497
BlackTotal              2487
HPIMale                 2608
HPIFemale               2607
HPITotal                2561
HispanicMale            2481
HispanicFemale          2480
HispanicTotal           2480
TRMale                  2487
TRFemale                2485
TRTotal                 2484
WhiteMale               2481
WhiteFemale             2481
WhiteTotal              2480
Latitude                   0
Longitude                  0
dtype: int64
def missing(df):
    """Print the percentage of null values in each column, largest first."""
    print('Percentage of missing values in the dataset:\n',
          round((df.isnull().sum() * 100 / len(df)), 2).sort_values(ascending=False))
missing(psChar_23)
Percentage of missing values in the dataset:
Grade13                99.87
AdultEd                99.82
Street2                99.44
Ungraded               92.22
Grade12                72.57
Grade10                72.49
Grade11                72.49
Grade9                 72.28
PreK                   68.05
Grade7                 67.23
Grade8                 66.97
Grade6                 62.50
Grade5                 47.71
Kindergarten           46.68
Grade4                 46.49
Grade1                 46.33
Grade3                 46.29
Grade2                 46.28
StaffFTE                3.80
HPIFemale               2.57
HPIMale                 2.57
AIANMale                2.55
AIANFem                 2.54
HPITotal                2.53
AIANTotal               2.50
AsianMale               2.46
BlackMale               2.46
AsianFemale             2.46
BlackFemale             2.46
WhiteFemale             2.45
WhiteTotal              2.45
TRTotal                 2.45
AsianTotal              2.45
BlackTotal              2.45
HispanicMale            2.45
HispanicFemale          2.45
HispanicTotal           2.45
WhiteMale               2.45
TotFemaleEnrollment     2.45
TRFemale                2.45
TRMale                  2.45
TotMaleEnrollment       2.45
StudentTeacherRatio     1.79
TotalEnrollment         1.65
Member                  1.65
City                    0.00
Street1                 0.00
SchoolName              0.00
LEAname                 0.00
LEAID                   0.00
ST_LEAID                0.00
StateABR                0.00
SurveyYear              0.00
X                       0.00
NCESID                  0.00
ObjectID                0.00
Y                       0.00
ReducedLunch            0.00
MealProgramCertified    0.00
TotalFreeLunch          0.00
FreeLunch               0.00
Zip                     0.00
State                   0.00
Zip4                    0.00
Phone                   0.00
Charter                 0.00
Virtual                 0.00
LowestGrade             0.00
HighestGrade            0.00
SchoolLevel             0.00
Status                  0.00
SchoolType              0.00
Status_Text             0.00
Locale                  0.00
County                  0.00
Latitude                0.00
Longitude               0.00
dtype: float64
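A variant of `missing()` could return the percentages as a Series instead of printing them. The result can then be filtered, for example to list the columns above the 45 percent cut-off used in the observations. `missing_share` is a hypothetical helper name, and the toy frame is illustrative only.

```python
import numpy as np
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of null values per column, largest first."""
    # isna().mean() is the fraction of nulls in each column; scale it to percent.
    return df.isna().mean().mul(100).round(2).sort_values(ascending=False)

toy = pd.DataFrame({
    "Grade13": [np.nan, np.nan, np.nan, 1.0],
    "PreK": [np.nan, 10.0, np.nan, 5.0],
    "Zip": [35950, 35950, 35951, 802],
})

share = missing_share(toy)
print(share)
# Columns whose missing share exceeds a chosen threshold.
print(share[share > 45].index.tolist())  # ['Grade13', 'PreK']
```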
Observations
A total of eighteen columns have more than forty-five percent of their values missing. For the grade-level columns this is expected: the dataset covers schools at every education level, and a school reports no enrollment for grades it does not offer. The free/reduced-price lunch columns and the student-teacher ratio need separate treatment. Much of their missing data is recorded as negative codes rather than nulls, so the percentages above understate it. According to the dataset's online description, the codes are:
- -1 indicates that data are missing.
- -2 or N indicates that data are not applicable.
- -9 indicates that data did not meet NCES data quality standards.

Because this research focuses only on youth sports participation in traditional public schools, I drop the AdultEd and Grade13 columns. I also drop 'Phone', 'LEAname', 'LEAID', 'ST_LEAID', 'SurveyYear', 'StaffFTE', 'Member', and 'NCESID', which are not needed for the analysis. Finally, I remove the rows that contain negative codes in the lunch and student-teacher ratio columns.
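Dropping every row that contains a negative code also discards that school's valid columns. An alternative is to convert the codes to `NaN` so pandas treats them as missing. The sketch below assumes that these numeric columns use only the codes -1, -2, and -9 described above. The text code "N" is not handled, and the toy values are illustrative.

```python
import numpy as np
import pandas as pd

# Negative codes described in the NCES dataset notes (assumed numeric here).
SENTINELS = [-1, -2, -9]

toy = pd.DataFrame({
    "FreeLunch": [654, -1, 228],
    "ReducedLunch": [43, -2, 0],
    "StudentTeacherRatio": [19.78, -9.0, 14.44],
})

code_cols = ["FreeLunch", "ReducedLunch", "StudentTeacherRatio"]

# Replace each sentinel with NaN; integer columns become float so they can hold NaN.
cleaned = toy.copy()
cleaned[code_cols] = cleaned[code_cols].replace(SENTINELS, np.nan)

print(cleaned)
print(cleaned.isna().sum().tolist())  # [1, 1, 1]
```

With the codes recoded as nulls, the `missing()` summary would report them too. Each later analysis could then decide row by row which columns it actually needs.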
dropCols = ['AdultEd','Phone','LEAname','LEAID','ST_LEAID','SurveyYear','StaffFTE','Member','NCESID','Grade13']
psChar_23 = psChar_23.drop(columns=dropCols)
psChar_23
psChar_23.isnull().sum()
X                          0
Y                          0
ObjectID                   0
StateABR                   0
SchoolName                 0
Street1                    1
Street2               100818
City                       0
State                      0
Zip                        0
Zip4                       0
Charter                    0
Virtual                    0
LowestGrade                0
HighestGrade               0
SchoolLevel                0
Status                     0
SchoolType                 0
Status_Text                0
Locale                     0
County                     0
TotalFreeLunch             0
FreeLunch                  0
ReducedLunch               0
MealProgramCertified       0
PreK                   68998
Kindergarten           47329
Grade1                 46978
Grade2                 46921
Grade3                 46931
Grade4                 47132
Grade5                 48376
Grade6                 63367
Grade7                 68166
Grade8                 67898
Grade9                 73289
Grade10                73501
Grade11                73502
Grade12                73574
Ungraded               93501
TotMaleEnrollment       2480
TotFemaleEnrollment     2480
TotalEnrollment         1671
StudentTeacherRatio     1814
AIANMale                2581
AIANFem                 2579
AIANTotal               2533
AsianMale               2492
AsianFemale             2490
AsianTotal              2484
BlackMale               2494
BlackFemale             2497
BlackTotal              2487
HPIMale                 2608
HPIFemale               2607
HPITotal                2561
HispanicMale            2481
HispanicFemale          2480
HispanicTotal           2480
TRMale                  2487
TRFemale                2485
TRTotal                 2484
WhiteMale               2481
WhiteFemale             2481
WhiteTotal              2480
Latitude                   0
Longitude                  0
dtype: int64
psChar_23["Status_Text"].unique()  # check which operational statuses appear
# drop schools that will open within two years or are temporarily closed
psChar_23 = psChar_23[~psChar_23["Status_Text"].str.contains(
    "School to be operational within two years|School temporarily closed", na=False)]
psChar_23["SchoolType"].unique()  # list the school types; only traditional (regular) schools are kept
psChar_23 = psChar_23[psChar_23["SchoolType"].str.contains("Regular School", na=False)]
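The substring filters above could also be written as exact matches with `Series.isin` and `Series.eq`. Exact matching avoids accidental partial matches and reads as an explicit list of excluded or allowed categories. The sketch assumes the labels appear in the data exactly as spelled in the filters above. The toy frame, including its "Special Education School" row, is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "SchoolName": ["A", "B", "C", "D"],
    "Status_Text": ["Currently operational", "School temporarily closed",
                    "Currently operational", "Currently operational"],
    "SchoolType": ["Regular School", "Regular School",
                   "Special Education School", "Regular School"],
})

excluded_status = ["School to be operational within two years", "School temporarily closed"]

# Keep rows whose status is not excluded AND whose type is exactly "Regular School".
mask = ~toy["Status_Text"].isin(excluded_status) & toy["SchoolType"].eq("Regular School")
filtered = toy[mask]
print(filtered["SchoolName"].tolist())  # ['A', 'D']
```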
# Filter out negative FRPL (free and reduced-price lunch) codes and student-teacher ratios.
# Note: NaN >= 0 evaluates to False, so rows with a missing StudentTeacherRatio are dropped too.
negativeCols = ['ReducedLunch', 'MealProgramCertified', 'FreeLunch', 'StudentTeacherRatio']
psChar_23 = psChar_23[(psChar_23[negativeCols] >= 0).all(axis=1)]
psChar_23.shape
(37392, 67)
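`TotalFreeLunch` is a count. In the rows shown earlier it equals `FreeLunch + ReducedLunch`, for example 654 + 43 = 697. Comparing schools of different sizes therefore calls for a share of enrollment. The sketch below assumes that eligibility for free or reduced-price lunch is used as a rough school-level SES proxy. The `FRL_pct` column name and the third toy row are hypothetical.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "SchoolName": ["Albertville Middle School", "Albertville High School", "Hypothetical Empty School"],
    "TotalFreeLunch": [697, 1254, 0],
    "TotalEnrollment": [890.0, 1712.0, 0.0],
})

# Zero enrollment would divide by zero, so treat it as missing first.
enrollment = toy["TotalEnrollment"].replace(0, np.nan)

# Percentage of enrolled students eligible for free or reduced-price lunch.
toy["FRL_pct"] = (100 * toy["TotalFreeLunch"] / enrollment).round(1)
print(toy[["SchoolName", "FRL_pct"]])
```

Applied to `psChar_23`, a column like this could be binned (for example into quartiles with `pd.qcut`) to compare Black and Hispanic enrollment shares across SES levels.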
psChar_23['Locale'].unique()
array(['32-Town: Distant', '42-Rural: Distant', '41-Rural: Fringe',
'13-City: Small', '21-Suburb: Large', '33-Town: Remote',
'31-Town: Fringe', '23-Suburb: Small', '12-City: Mid-size',
'43-Rural: Remote', '22-Suburb: Mid-size', '11-City: Large'],
dtype=object)
Locale = {'42-Rural: Distant':'Rural',
'41-Rural: Fringe':'Rural',
'43-Rural: Remote':'Rural',
'32-Town: Distant':'Town',
'33-Town: Remote':'Town',
'31-Town: Fringe':'Town',
'13-City: Small':'City',
'12-City: Mid-size':'City',
'11-City: Large':'City',
'21-Suburb: Large':'Suburb',
'23-Suburb: Small':'Suburb',
'22-Suburb: Mid-size':'Suburb'}
Locale
{'42-Rural: Distant': 'Rural',
'41-Rural: Fringe': 'Rural',
'43-Rural: Remote': 'Rural',
'32-Town: Distant': 'Town',
'33-Town: Remote': 'Town',
'31-Town: Fringe': 'Town',
'13-City: Small': 'City',
'12-City: Mid-size': 'City',
'11-City: Large': 'City',
'21-Suburb: Large': 'Suburb',
'23-Suburb: Small': 'Suburb',
'22-Suburb: Mid-size': 'Suburb'}
psChar_23['Locale'] = psChar_23['Locale'].map(Locale)
psChar_23['Locale'].unique()
array(['Town', 'Rural', 'City', 'Suburb'], dtype=object)
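Instead of the hand-written `Locale` dictionary, the broad category could be extracted from the code string itself. The category is the word between the numeric prefix and the colon. The small sketch below applies `Series.str.extract` to a few of the codes listed above. As with `.map`, any code that does not fit the pattern comes back as `NaN`.

```python
import pandas as pd

codes = pd.Series(["32-Town: Distant", "42-Rural: Distant", "11-City: Large", "21-Suburb: Large"])

# Capture the word between the numeric prefix and the colon: "32-Town: Distant" -> "Town".
broad = codes.str.extract(r"^\d+-(\w+):", expand=False)
print(broad.tolist())  # ['Town', 'Rural', 'City', 'Suburb']

# Unmatched codes would show up here as a nonzero count.
print(broad.isna().sum())  # 0
```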
psChar_23.describe()
|  | X | Y | ObjectID | Zip | Status | TotalFreeLunch | FreeLunch | ReducedLunch | MealProgramCertified | PreK | Kindergarten | Grade1 | Grade2 | Grade3 | Grade4 | Grade5 | Grade6 | Grade7 | Grade8 | Grade9 | Grade10 | Grade11 | Grade12 | Ungraded | TotMaleEnrollment | TotFemaleEnrollment | TotalEnrollment | StudentTeacherRatio | AIANMale | AIANFem | AIANTotal | AsianMale | AsianFemale | AsianTotal | BlackMale | BlackFemale | BlackTotal | HPIMale | HPIFemale | HPITotal | HispanicMale | HispanicFemale | HispanicTotal | TRMale | TRFemale | TRTotal | WhiteMale | WhiteFemale | WhiteTotal | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 37392.000000 | 37392.000000 | 37392.000000 | 37392.000000 | 37392.000000 | 37392.000000 | 37392.000000 | 37392.000000 | 37392.000000 | 11978.000000 | 22550.000000 | 22629.000000 | 22642.000000 | 22602.000000 | 22538.000000 | 22257.000000 | 14797.000000 | 11647.000000 | 11603.000000 | 8163.000000 | 8083.000000 | 8065.000000 | 8051.000000 | 1952.000000 | 36722.000000 | 36722.000000 | 37392.000000 | 37392.000000 | 36672.000000 | 36670.000000 | 36700.000000 | 36717.000000 | 36720.000000 | 36722.000000 | 36713.000000 | 36711.000000 | 36717.000000 | 36661.000000 | 36663.000000 | 36689.000000 | 36722.000000 | 36722.000000 | 36722.000000 | 36719.000000 | 36720.000000 | 36720.000000 | 36722.000000 | 36722.000000 | 36722.000000 | 37392.000000 | 37392.000000 |
| mean | -100.251468 | 37.290953 | 36287.616683 | 63446.899497 | 1.014549 | 329.526610 | 294.008478 | 35.518132 | 211.785756 | 32.782017 | 72.719335 | 71.254938 | 69.543636 | 71.746350 | 71.135815 | 72.845082 | 110.678448 | 141.237400 | 144.294062 | 223.755115 | 217.416924 | 200.690763 | 191.246429 | 5.592725 | 298.766843 | 283.438457 | 582.363982 | 17.143016 | 3.452471 | 3.328279 | 6.775395 | 18.924422 | 17.763154 | 36.684031 | 45.420178 | 43.891340 | 89.299398 | 1.804724 | 1.707362 | 3.509499 | 90.627880 | 86.666930 | 177.294810 | 16.256734 | 15.626416 | 31.882707 | 122.303170 | 114.477398 | 236.780568 | 37.290953 | -100.251468 |
| std | 19.640040 | 6.016159 | 29092.351157 | 28541.959418 | 0.193091 | 302.519034 | 276.192414 | 57.319530 | 205.370496 | 38.554464 | 43.464381 | 41.309698 | 40.438018 | 41.754483 | 42.033706 | 46.943546 | 107.750815 | 131.743413 | 135.120098 | 228.483527 | 214.329761 | 199.781788 | 191.818897 | 9.279454 | 244.643853 | 236.573281 | 478.938331 | 13.327592 | 16.876568 | 16.257685 | 33.002197 | 51.673689 | 48.952495 | 100.345711 | 81.985733 | 80.888984 | 162.123805 | 10.492019 | 9.835723 | 20.222761 | 136.744539 | 131.278507 | 267.380097 | 19.221038 | 18.730197 | 37.461901 | 131.928343 | 126.522191 | 257.671934 | 6.016159 | 19.640040 |
| min | -171.715402 | 14.140873 | 1.000000 | 3901.000000 | 1.000000 | 3.000000 | 0.000000 | 0.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9.000000 | 0.610000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 14.140873 | -171.715402 |
| 25% | -118.201758 | 33.753817 | 11736.500000 | 34249.000000 | 1.000000 | 136.000000 | 115.000000 | 5.000000 | 73.000000 | 10.000000 | 45.000000 | 45.000000 | 44.000000 | 45.000000 | 44.000000 | 44.000000 | 36.000000 | 34.000000 | 35.000000 | 44.000000 | 44.000000 | 42.000000 | 40.000000 | 0.000000 | 157.000000 | 148.000000 | 308.000000 | 13.490000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 2.000000 | 1.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 11.000000 | 11.000000 | 22.000000 | 4.000000 | 3.000000 | 7.000000 | 26.000000 | 24.000000 | 50.000000 | 33.753816 | -118.201758 |
| 50% | -93.889011 | 36.964123 | 26215.500000 | 64014.000000 | 1.000000 | 261.000000 | 230.000000 | 20.000000 | 158.000000 | 24.000000 | 69.000000 | 68.000000 | 66.000000 | 68.000000 | 67.000000 | 68.000000 | 73.000000 | 95.000000 | 97.000000 | 135.000000 | 132.000000 | 120.000000 | 113.000000 | 2.000000 | 244.000000 | 230.000000 | 474.000000 | 16.140000 | 1.000000 | 0.000000 | 1.000000 | 3.000000 | 3.000000 | 6.000000 | 11.000000 | 10.000000 | 21.000000 | 0.000000 | 0.000000 | 0.000000 | 40.000000 | 38.000000 | 78.000000 | 11.000000 | 10.000000 | 22.000000 | 89.000000 | 82.000000 | 171.000000 | 36.964123 | -93.889011 |
| 75% | -84.473956 | 39.752610 | 51456.250000 | 92405.000000 | 1.000000 | 427.000000 | 387.000000 | 45.000000 | 285.000000 | 43.000000 | 95.000000 | 92.000000 | 90.000000 | 93.000000 | 92.000000 | 94.000000 | 150.000000 | 228.000000 | 232.000000 | 373.000000 | 361.000000 | 330.000000 | 310.000000 | 8.000000 | 358.000000 | 338.000000 | 694.000000 | 20.000000 | 2.000000 | 2.000000 | 3.000000 | 14.000000 | 13.000000 | 27.000000 | 54.000000 | 52.000000 | 106.000000 | 1.000000 | 1.000000 | 2.000000 | 118.000000 | 114.000000 | 233.000000 | 22.000000 | 21.000000 | 44.000000 | 172.000000 | 160.000000 | 332.000000 | 39.752610 | -84.473956 |
| max | 145.784430 | 71.298478 | 100508.000000 | 99950.000000 | 8.000000 | 5770.000000 | 5563.000000 | 1400.000000 | 2921.000000 | 903.000000 | 873.000000 | 646.000000 | 665.000000 | 691.000000 | 669.000000 | 727.000000 | 923.000000 | 844.000000 | 930.000000 | 6251.000000 | 2855.000000 | 1293.000000 | 1339.000000 | 223.000000 | 4352.000000 | 4524.000000 | 8876.000000 | 1860.000000 | 585.000000 | 513.000000 | 1098.000000 | 1335.000000 | 1224.000000 | 2559.000000 | 2195.000000 | 2207.000000 | 4402.000000 | 556.000000 | 440.000000 | 996.000000 | 1947.000000 | 2118.000000 | 4065.000000 | 436.000000 | 422.000000 | 828.000000 | 1989.000000 | 2305.000000 | 4294.000000 | 71.298478 | 145.784430 |
psChar_23og = psChar_23.copy()  # keep an untouched copy of the original data before any filtering
Observations of Descriptive Statistics¶
(Min, Max):
- TotalFreeLunch (3, 5770); FreeLunch (0, 5563); ReducedLunch (0, 1400); MealProgramCertified (3, 2921)
- PreK (0, 903)
- Kindergarten (0, 873)
- Grade1 (0, 646)
- Grade2 (0, 665)
- Grade3 (0, 691)
- Grade4 (0, 669)
- Grade5 (0, 727)
- Grade6 (0, 923)
- Grade7 (0, 844)
- Grade8 (0, 930)
- Grade9 (0, 6251)
- Grade10 (0, 2855)
- Grade11 (0, 1293)
- Grade12 (0, 1339)
- Ungraded (0, 223)
- Total Male Enrollment (0, 4352); Total Female Enrollment (0, 4524); Total Enrollment (9, 8876)
- Student Teacher Ratio (0.61, 1860); the maximum is an extreme outlier given the upper quartile of 20
- American Indian/Alaska Native Male (0, 585); American Indian/Alaska Native Female (0, 513); American Indian/Alaska Native Total (0, 1098)
- Asian Male (0, 1335); Asian Female (0, 1224); Asian Total (0, 2559)
- Black Male (0, 2195); Black Female (0, 2207); Black Total (0, 4402)
- Native Hawaiian/Pacific Islander (HPI) Male (0, 556); Native Hawaiian/Pacific Islander (HPI) Female (0, 440); Native Hawaiian/Pacific Islander (HPI) Total (0, 996)
- Hispanic Male (0, 1947); Hispanic Female (0, 2118); Hispanic Total (0, 4065)
- Two or More Races Male (0, 436); Two or More Races Female (0, 422); Two or More Races Total (0, 828)
- White Male (0, 1989); White Female (0, 2305); White Total (0, 4294)
Mean:
- TotalFreeLunch: 329.53; FreeLunch: 294.01; ReducedLunch: 35.52; MealProgramCertified: 211.79
- PreK: 32.78 students
- Kindergarten: 72.72 students
- Grade1: 71.25 students
- Grade2: 69.54 students
- Grade3: 71.75 students
- Grade4: 71.14 students
- Grade5: 72.85 students
- Grade6: 110.68 students
- Grade7: 141.24 students
- Grade8: 144.29 students
- Grade9: 223.76 students
- Grade10: 217.42 students
- Grade11: 200.69 students
- Grade12: 191.25 students
- Ungraded: 5.59 students
- Total Male Enrollment: 298.77 students; Total Female Enrollment: 283.44 students; Total Enrollment: 582.36 students
- Student Teacher Ratio: 17.14 students/teacher
- American Indian/Alaska Native Male: 3.45 students; American Indian/Alaska Native Female: 3.33 students; American Indian/Alaska Native Total: 6.78 students
- Asian Male: 18.92 students; Asian Female: 17.76 students; Asian Total: 36.68 students
- Black Male: 45.42 students; Black Female: 43.89 students; Black Total: 89.30 students
- Native Hawaiian/Pacific Islander (HPI) Male: 1.80 students; Native Hawaiian/Pacific Islander (HPI) Female: 1.71 students; Native Hawaiian/Pacific Islander (HPI) Total: 3.51 students
- Hispanic Male: 90.63 students; Hispanic Female: 86.67 students; Hispanic Total: 177.29 students
- Two or More Races Male: 16.26 students; Two or More Races Female: 15.63 students; Two or More Races Total: 31.88 students
- White Male: 122.30 students; White Female: 114.48 students; White Total: 236.78 students
Quartile Ranges (25%, 75%):
- TotalFreeLunch: (136, 427); FreeLunch: (115, 387); ReducedLunch: (5, 45); MealProgramCertified: (73, 285)
- PreK: (10, 43)
- Kindergarten: (45, 95)
- Grade1: (45, 92)
- Grade2: (44, 90)
- Grade3: (45, 93)
- Grade4: (44, 92)
- Grade5: (44, 94)
- Grade6: (36, 150)
- Grade7: (34, 228)
- Grade8: (35, 232)
- Grade9: (44, 373)
- Grade10: (44, 361)
- Grade11: (42, 330)
- Grade12: (40, 310)
- Ungraded: (0, 8)
- Total Male Enrollment: (157, 358); Total Female Enrollment: (148, 338); Total Enrollment: (308, 694)
- Student Teacher Ratio: (13.49, 20)
- American Indian/Alaska Native Male: (0, 2); American Indian/Alaska Native Female: (0, 2); American Indian/Alaska Native Total: (0, 3)
- Asian Male: (0, 14); Asian Female: (0, 13); Asian Total: (1, 27)
- Black Male: (2, 54); Black Female: (1, 52); Black Total: (3, 106)
- Native Hawaiian/Pacific Islander (HPI) Male: (0, 1); Native Hawaiian/Pacific Islander (HPI) Female: (0, 1); Native Hawaiian/Pacific Islander (HPI) Total: (0, 2)
- Hispanic Male: (11, 118); Hispanic Female: (11, 114); Hispanic Total: (22, 233)
- Two or More Races Male: (4, 22); Two or More Races Female: (3, 21); Two or More Races Total: (7, 44)
- White Male: (26, 172); White Female: (24, 160); White Total: (50, 332)
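Copying dozens of values out of the describe() table by hand is error-prone. A minimal sketch of generating the same range, mean, and quartile lines directly from describe() follows; the two-column DataFrame is a toy stand-in with invented values, not the NCES data, and on the real notebook one would pass psChar_23 instead.

```python
import pandas as pd

# Toy stand-in for a few psChar_23 columns (invented values, not NCES data)
df = pd.DataFrame({
    "Ungraded": [0, 0, 2, 8, 20],
    "BlackMale": [0, 2, 11, 54, 300],
})

stats = df.describe()

# Emit one summary line per column straight from describe(), so no number is retyped
for col in stats.columns:
    s = stats[col]
    print(f"- {col}: range ({s['min']:g}, {s['max']:g}); "
          f"mean {s['mean']:.2f}; quartiles ({s['25%']:g}, {s['75%']:g})")
```

pandas computes the 25% and 75% rows with linear interpolation by default, so the printed quartiles match the describe() table exactly.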
Standard Deviation:
- Higher than the mean: Reduced Lunch, PreK, Grade9, Grade12, Ungraded, and every race/ethnicity column
- Lower than the mean: Total Free Lunch, Free Lunch, Meal Program Certified, all other grades, Total Male Enrollment, Total Female Enrollment, Total Enrollment, Student to Teacher Ratio
For the free/reduced-price lunch (FRPL) counts, the standard deviations are slightly lower than the means, except for ReducedLunch, whose standard deviation is higher than its mean.
The standard deviations for PreK, Ungraded, and Grades 9 and 12 are higher than the means (only marginally for Grades 9 and 12), while those for all other grades are lower.
The standard deviations for the enrollment totals are lower than the means.
The standard deviation for the student-teacher ratio is lower than the mean.
The standard deviations for every racial/ethnic group are higher than the means, indicating wide variation across schools. The gap is smallest for White students (the standard deviation is about 1.1 times the mean), compared with about 1.5 times for Hispanic students and 1.8 times for Black students.
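Comparing each standard deviation with its mean is essentially the coefficient of variation (CV = std / mean), a unitless measure that lets groups with very different average enrollments be compared directly. The sketch below computes it from the mean and std values printed by psChar_23.describe() above; CV is a standard statistic chosen here for illustration, not a step the notebook itself performs.

```python
# Mean and standard deviation per school, copied from the describe() output above
summary = {
    "WhiteTotal": (236.780568, 257.671934),
    "HispanicTotal": (177.294810, 267.380097),
    "BlackTotal": (89.299398, 162.123805),
}

# Coefficient of variation: std / mean (scale-free, so group sizes do not distort the comparison)
cv = {group: std / mean for group, (mean, std) in summary.items()}

# Print groups from least to most dispersed relative to their mean
for group, value in sorted(cv.items(), key=lambda kv: kv[1]):
    print(f"{group}: CV = {value:.2f}")
```

A CV above 1 means the typical school-to-school spread is larger than the average count itself.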
Mean/Median Closeness:
The medians for the free/reduced-price lunch counts are lower than the means.
For Kindergarten through Grade 5, the medians are close to, but lower than, the means. For PreK and the higher grades, the gaps are larger, but the medians are still lower than the means.
The median student-teacher ratio (16.14) is close to the mean (17.14).
The medians for the Black (21) and Hispanic (78) student counts are far below their means (89.30 and 177.29). A mean well above the median indicates a right-skewed distribution: most schools enroll relatively few Black or Hispanic students, while a smaller number of schools enroll many.
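The mean/median gap can be turned into a single number with Pearson's second (median) skewness coefficient, 3 * (mean - median) / std, where positive values indicate a long right tail. This is a standard textbook measure swapped in for illustration, not the notebook's own method; the inputs are the mean, 50%, and std values from the describe() output above.

```python
# Mean, median (50% row), and std per school, copied from the describe() output above
summary = {
    "WhiteTotal": (236.780568, 171.0, 257.671934),
    "HispanicTotal": (177.294810, 78.0, 267.380097),
    "BlackTotal": (89.299398, 21.0, 162.123805),
}

# Pearson's median skewness: 3 * (mean - median) / std; positive means right-skewed
skew = {g: 3 * (mean - median) / std for g, (mean, median, std) in summary.items()}

for g, value in skew.items():
    print(f"{g}: median skewness = {value:.2f}")
```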
print(psChar_23["StudentTeacherRatio"].describe())
count    37392.000000
mean        17.143016
std         13.327592
min          0.610000
25%         13.490000
50%         16.140000
75%         20.000000
max       1860.000000
Name: StudentTeacherRatio, dtype: float64
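The maximum ratio of 1860 sits far above the upper quartile of 20. One common way to flag such values is Tukey's rule: points below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are treated as outliers. The sketch below applies that convention (an assumption, not a rule the notebook adopts) to the quartiles printed above.

```python
# Quartiles and maximum of StudentTeacherRatio, copied from the describe() output above
q1, q3 = 13.49, 20.00
observed_max = 1860.0

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # Tukey's conventional 1.5 * IQR fences
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.2f}, fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("max is beyond upper fence:", observed_max > upper_fence)
```

On the full table, the same fences could be applied with a boolean mask such as psChar_23["StudentTeacherRatio"] > upper_fence to inspect the flagged schools before deciding whether to exclude them; the y-axis limit of 50 in the plots below already hides the most extreme values.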
psChar_23.columns
Index(['X', 'Y', 'ObjectID', 'StateABR', 'SchoolName', 'Street1', 'Street2',
'City', 'State', 'Zip', 'Zip4', 'Charter', 'Virtual', 'LowestGrade',
'HighestGrade', 'SchoolLevel', 'Status', 'SchoolType', 'Status_Text',
'Locale', 'County', 'TotalFreeLunch', 'FreeLunch', 'ReducedLunch',
'MealProgramCertified', 'PreK', 'Kindergarten', 'Grade1', 'Grade2',
'Grade3', 'Grade4', 'Grade5', 'Grade6', 'Grade7', 'Grade8', 'Grade9',
'Grade10', 'Grade11', 'Grade12', 'Ungraded', 'TotMaleEnrollment',
'TotFemaleEnrollment', 'TotalEnrollment', 'StudentTeacherRatio',
'AIANMale', 'AIANFem', 'AIANTotal', 'AsianMale', 'AsianFemale',
'AsianTotal', 'BlackMale', 'BlackFemale', 'BlackTotal', 'HPIMale',
'HPIFemale', 'HPITotal', 'HispanicMale', 'HispanicFemale',
'HispanicTotal', 'TRMale', 'TRFemale', 'TRTotal', 'WhiteMale',
'WhiteFemale', 'WhiteTotal', 'Latitude', 'Longitude'],
dtype='object')
print(type(psChar_23))
<class 'pandas.core.frame.DataFrame'>
Scatter Plots¶
import matplotlib.pyplot as plt
import numpy as np

# Drop schools reporting more FRPL-eligible students than enrolled students; .copy() avoids SettingWithCopyWarning
psChar_23 = psChar_23[psChar_23["TotalFreeLunch"] <= psChar_23["TotalEnrollment"]].copy()
# Percentage of students eligible for free or reduced-price lunch
psChar_23["LunchRate"] = (psChar_23["TotalFreeLunch"] / psChar_23["TotalEnrollment"]) * 100

race_colors = {"BlackTotal": "tab:blue", "HispanicTotal": "tab:orange", "WhiteTotal": "tab:green"}
race_labels = {"BlackTotal": "Predominantly Black School",
               "HispanicTotal": "Predominantly Latino/Hispanic School",
               "WhiteTotal": "Predominantly White School"}

# Label each school by the largest of these three groups (a plurality among them, not necessarily a majority)
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)

fig, ax = plt.subplots()
for race, color in race_colors.items():
    subset = psChar_23[psChar_23["PredominantRace"] == race]
    ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"], c=color, s=45,
               label=race_labels[race], alpha=0.3, edgecolors='none')
ax.legend(loc='upper right', shadow=True)
ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (by Race)")
# Fixed limits clip extreme outliers (the maximum ratio is 1860)
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)
plt.show()
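Because PredominantRace only compares three columns, a school is labeled "predominantly Hispanic" whenever Hispanic students outnumber Black and White students, even if they are a small share of the whole school (for instance, where Asian students are the majority). The sketch below, using invented counts for three hypothetical schools, shows one way to separate a plurality among these groups from an actual majority of enrollment; the TopShare and IsMajority columns are hypothetical names, not part of the dataset.

```python
import pandas as pd

# Invented enrollment counts for three hypothetical schools (not NCES data)
schools = pd.DataFrame({
    "BlackTotal": [300, 40, 10],
    "HispanicTotal": [100, 45, 20],
    "WhiteTotal": [50, 40, 15],
    "TotalEnrollment": [500, 400, 60],
})

groups = ["BlackTotal", "HispanicTotal", "WhiteTotal"]

# Same rule as the notebook: the largest of the three groups compared
schools["PredominantRace"] = schools[groups].idxmax(axis=1)

# Share of all students held by that group; a majority means more than half of total enrollment
schools["TopShare"] = schools[groups].max(axis=1) / schools["TotalEnrollment"]
schools["IsMajority"] = schools["TopShare"] > 0.5

print(schools[["PredominantRace", "TopShare", "IsMajority"]])
```

Filtering the plots on IsMajority would restrict them to schools where the labeled group is more than half of all students.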
Hispanic/Latino Demographic¶
# Plot only schools where Hispanic students are the largest of the three groups compared
subset = psChar_23[psChar_23["PredominantRace"] == "HispanicTotal"]
fig, ax = plt.subplots()
ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"], c="tab:orange", s=45, alpha=0.3, edgecolors='none')
ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Predominantly Hispanic Schools)")
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)
plt.show()
Black Demographic¶
# Plot only schools where Black students are the largest of the three groups compared
subset = psChar_23[psChar_23["PredominantRace"] == "BlackTotal"]
fig, ax = plt.subplots()
ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"], c="tab:blue", s=45, alpha=0.3, edgecolors='none')
ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Predominantly Black Schools)")
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)
plt.show()
White Demographic¶
# Plot only schools where White students are the largest of the three groups compared
subset = psChar_23[psChar_23["PredominantRace"] == "WhiteTotal"]
fig, ax = plt.subplots()
ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"], c="tab:green", s=45, alpha=0.3, edgecolors='none')
ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Predominantly White Schools)")
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)
plt.show()
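The scatter plots show the relationship between FRPL eligibility and student-teacher ratio visually; a rank correlation per group gives a compact numeric summary of the same comparison. The sketch below uses Spearman's rho (computed as the Pearson correlation of ranks, so no SciPy dependency is needed), chosen here because it is insensitive to extreme values such as the 1860 ratio. The rows are invented for illustration, and spearman is a hypothetical helper; on the notebook's data one would group psChar_23 the same way.

```python
import pandas as pd

# Invented rows standing in for psChar_23 (not NCES data)
df = pd.DataFrame({
    "PredominantRace": ["BlackTotal"] * 4 + ["WhiteTotal"] * 4,
    "LunchRate": [90, 70, 80, 60, 20, 40, 30, 50],
    "StudentTeacherRatio": [18, 15, 17, 14, 16, 12, 15, 11],
})


def spearman(x: pd.Series, y: pd.Series) -> float:
    # Spearman's rho = Pearson correlation of the ranks (ties get average ranks)
    return x.rank().corr(y.rank())


# One coefficient per predominant-race group
rho = {
    race: spearman(g["LunchRate"], g["StudentTeacherRatio"])
    for race, g in df.groupby("PredominantRace")
}
print(rho)
```

A coefficient near +1 means schools with higher FRPL rates tend to have more students per teacher; near -1 means the opposite; near 0 means no monotonic relationship.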
Bubble Plot¶
psChar_23.head(1)
| | X | Y | ObjectID | StateABR | SchoolName | Street1 | Street2 | City | State | Zip | Zip4 | Charter | Virtual | LowestGrade | HighestGrade | SchoolLevel | Status | SchoolType | Status_Text | Locale | County | TotalFreeLunch | FreeLunch | ReducedLunch | MealProgramCertified | PreK | Kindergarten | Grade1 | Grade2 | Grade3 | Grade4 | Grade5 | Grade6 | Grade7 | Grade8 | Grade9 | Grade10 | Grade11 | Grade12 | Ungraded | TotMaleEnrollment | TotFemaleEnrollment | TotalEnrollment | StudentTeacherRatio | AIANMale | AIANFem | AIANTotal | AsianMale | AsianFemale | AsianTotal | BlackMale | BlackFemale | BlackTotal | HPIMale | HPIFemale | HPITotal | HispanicMale | HispanicFemale | HispanicTotal | TRMale | TRFemale | TRTotal | WhiteMale | WhiteFemale | WhiteTotal | Latitude | Longitude | LunchRate | PredominantRace |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -86.2062 | 34.2602 | 1 | AL | Albertville Middle School | 600 E Alabama Ave | NaN | Albertville | AL | 35950 | No | Not Virtual | 07 | 08 | Middle | 1 | Regular School | Currently operational | Town | Marshall County | 697 | 654 | 43 | 587 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 440.0 | 450.0 | NaN | NaN | NaN | NaN | NaN | 459.0 | 431.0 | 890.0 | 19.78 | 4.0 | 1.0 | 5.0 | 4.0 | 2.0 | 6.0 | 15.0 | 14.0 | 29.0 | 0.0 | 1.0 | 1.0 | 251.0 | 251.0 | 502.0 | 17.0 | 15.0 | 32.0 | 168.0 | 147.0 | 315.0 | 34.2602 | -86.2062 | 78.314607 | HispanicTotal |
import math

import pandas as pd
import plotly.graph_objects as go

# Plot a reproducible random sample of 1,000 schools to keep the figure readable
sample_size = 1000
sample = psChar_23.sample(n=sample_size, random_state=1)

# Build hover text and bubble sizes row by row
hover_text = []
bubble_size = []
for _, row in sample.iterrows():
    hover_text.append(('School: {SchoolName}<br>' +
                       'Lunch Rate: {LunchRate:.2f}<br>' +
                       'Students per Teacher: {StudentTeacherRatio}<br>' +
                       'Total Enrollment: {TotalEnrollment}<br>').format(SchoolName=row['SchoolName'],
                                                                         LunchRate=row['LunchRate'],
                                                                         StudentTeacherRatio=row['StudentTeacherRatio'],
                                                                         TotalEnrollment=row['TotalEnrollment']))
    # With sizemode='area', bubble area is proportional to this value (the square root of enrollment)
    bubble_size.append(math.sqrt(row['TotalEnrollment']))
sample['text'] = hover_text
sample['size'] = bubble_size

# Scaling suggested in Plotly's bubble-chart docs: 2 * max(size) / (desired max marker size ** 2), here 25
sizeref = 2. * max(sample['size']) / (25 ** 2)

race_categories = ['BlackTotal', 'HispanicTotal', 'WhiteTotal']
race_names = {'BlackTotal': 'Predominantly Black',
              'HispanicTotal': 'Predominantly Hispanic',
              'WhiteTotal': 'Predominantly White'}
race_data = {race: sample[sample["PredominantRace"] == race] for race in race_categories}

fig = go.Figure()
for race, subset in race_data.items():
    fig.add_trace(go.Scatter(
        x=subset["LunchRate"],
        y=subset["StudentTeacherRatio"],
        name=race_names[race],
        text=subset["text"],
        marker_size=subset['size'],
    ))
fig.update_traces(mode='markers', marker=dict(sizemode='area', sizeref=sizeref, line_width=2))
fig.update_layout(
    title="FRPL Eligibility & Student-Teacher Ratio",
    xaxis=dict(title="% of Students w/ FRPL Eligibility", gridcolor='white', gridwidth=2),
    yaxis=dict(title="Students per Teacher", gridcolor='white', gridwidth=2, range=[0, 50], dtick=20),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
)
fig.show()